Extracting Wikipedia Historical Attributes Data
Authors
Abstract
In this paper, we describe the collection of a large structured dataset of temporally anchored relational data, obtained from the full revision history of the English Wikipedia. By mining (attribute, value) pairs from this revision history, we are able to collect a comprehensive, temporally-aware knowledge base that contains data on how attributes change over time. We discuss different characteristics of the extracted dataset, which is freely distributed for further study.
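As a rough illustration of what mining (attribute, value) pairs from a revision history can look like, the sketch below scans a list of revisions and records each attribute's value whenever it changes. It is not the authors' pipeline: the `| name = value` line format, the `PAIR_RE` pattern, the function names, and the sample revisions are all assumptions made for the example, and real wikitext (nested templates, multi-line values) would need a proper parser.

```python
import re
from collections import defaultdict

# Hypothetical, simplified matcher: one "| name = value" pair per line.
# It does not handle nested templates or values that span several lines.
PAIR_RE = re.compile(
    r"^[ \t]*\|[ \t]*([^=|\n]+?)[ \t]*=[ \t]*(.*?)[ \t]*$",
    re.MULTILINE,
)


def extract_pairs(wikitext):
    """Return {attribute: value} for every non-empty pair found in the text."""
    return {
        name.lower(): value
        for name, value in PAIR_RE.findall(wikitext)
        if value
    }


def attribute_history(revisions):
    """Map each attribute to a list of (timestamp, value), one entry per change."""
    history = defaultdict(list)
    # ISO 8601 timestamps in the same time zone sort chronologically as strings.
    for timestamp, wikitext in sorted(revisions):
        for name, value in extract_pairs(wikitext).items():
            # Record the value only when it differs from the last one seen.
            if not history[name] or history[name][-1][1] != value:
                history[name].append((timestamp, value))
    return dict(history)


# Made-up sample data, for illustration only.
revisions = [
    ("2005-01-01T00:00:00Z",
     "{{Infobox city\n| name = Springfield\n| population = 1000\n}}"),
    ("2008-06-01T00:00:00Z",
     "{{Infobox city\n| name = Springfield\n| population = 1200\n}}"),
]

history = attribute_history(revisions)
for attribute, changes in history.items():
    print(attribute, changes)
```

Storing only the changes, rather than the value at every revision, keeps the timeline compact while still showing when each attribute took on a new value.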
Related resources
WHAD: Wikipedia historical attributes data - Historical structured data extraction and vandalism detection from the Wikipedia edit history
This paper describes the generation of temporally anchored infobox attribute data from the Wikipedia history of revisions. By mining (attribute, value) pairs from the revision history of the English Wikipedia we are able to collect a comprehensive knowledge base that contains data on how attributes change over time. When dealing with the Wikipedia edit history, vandalic and erroneous edits are ...
Extracting and Visualising Biographical Events from Wikipedia
This work presents a proposal for the development of a natural language processing module for event and temporal analysis of biographies as available in Wikipedia. At the current level of development, we restricted the extraction to temporally anchored events as they represent salient information which can be further used to extract additional events and facilitate their chronological ordering ...
Identifying and Extracting Named Entities from Wikipedia Database Using Entity Infoboxes
An approach for named entity classification based on Wikipedia article infoboxes is described in this paper. It identifies the three fundamental named entity types, namely Person, Location, and Organization. Entity classification is accomplished by matching entity attributes extracted from the relevant entity article infobox against core entity attributes built from Wikipedia Infobox Templat...
A Two-Step Approach to Extracting Attributes for People on the Web
Personal names are among the most frequently searched items in web search engines. Extracting information in the form of attributes and values for a particular person enables us to uniquely identify that person on the web. For example, although namesakes share the same name, they usually have different dates of birth or affiliations. Given a set of documents retrieved for a particular per...
Automatic Classification and Relationship Extraction for Multi-Lingual and Multi-Granular Events from Wikipedia
Wikipedia is a rich data source for knowledge from all domains. As part of this knowledge, historical and daily events (news) are collected for different languages on special pages and in event portals. As only a small number of events are available in structured form in DBpedia, we extract these events with a rule-based approach from Wikipedia pages. In this paper we focus on three aspects: (1)...